Multi-state predictive neural networks for text-independent speaker recognition

Authors

  • Thierry Artières
  • Patrick Gallinari
Abstract

Both Hidden Markov Models (HMMs) and Neural Networks have already been used as production systems for speaker identification or verification. Recently, [9] showed that ergodic multi-state HMMs do not outperform one-state "hidden" Markov Models, i.e. Gaussian Mixture Models, for speaker recognition. That work showed that the important characteristic of these models is the total number of mixtures, not the number of states. These HMMs are thus unable to make use of temporal information when performing speaker recognition. On the other hand, recent experiments have shown that, for neural predictive systems, modeling non-stationarity significantly improves performance [6]. We are interested here in the development of such models, which will be referred to as multi-state predictive neural networks (MSPNNs). We study how well these systems perform speaker identification and discuss the superiority of multi-state over one-state models. We provide results on 15 talkers from the TIMIT database.
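The abstract does not describe the MSPNN architecture, so the following is only a minimal sketch of the general idea behind predictive speaker models, under several assumptions that are not in the source: each speaker owns a small set of predictors ("states"), a test utterance is scored by how well those predictors forecast each frame from the previous one, and the speaker with the lowest prediction error is selected. To keep it short, it uses scalar features and linear least-squares predictors in place of neural networks and cepstral vectors, and it ignores state-transition constraints by letting each frame pick its best state. All function names are hypothetical.

```python
import math


def fit_line(pairs):
    """Least-squares fit of cur ~ a * prev + b over (prev, cur) pairs."""
    n = len(pairs)
    if n == 0:
        return (0.0, 0.0)
    mean_prev = sum(p for p, _ in pairs) / n
    mean_cur = sum(c for _, c in pairs) / n
    var = sum((p - mean_prev) ** 2 for p, _ in pairs)
    if var == 0.0:
        # Degenerate case: fall back to predicting the mean.
        return (0.0, mean_cur)
    a = sum((p - mean_prev) * (c - mean_cur) for p, c in pairs) / var
    return (a, mean_cur - a * mean_prev)


def sq_err(pred, prev, cur):
    """Squared error of one linear predictor (a, b) on one frame pair."""
    a, b = pred
    return (cur - (a * prev + b)) ** 2


def train_speaker(frames, n_states, n_iter=10):
    """Alternate between assigning frames to states and refitting each state.

    Assumption: a k-means-like procedure stands in for whatever training
    scheme the paper actually uses.
    """
    pairs = list(zip(frames[:-1], frames[1:]))
    # Initialise each state on a contiguous chunk of the utterance.
    chunk = math.ceil(len(pairs) / n_states)
    preds = [fit_line(pairs[i * chunk:(i + 1) * chunk]) for i in range(n_states)]
    for _ in range(n_iter):
        buckets = [[] for _ in range(n_states)]
        for prev, cur in pairs:
            best = min(range(n_states), key=lambda s: sq_err(preds[s], prev, cur))
            buckets[best].append((prev, cur))
        # Keep the old predictor for any state that received no frames.
        preds = [fit_line(b) if b else preds[s] for s, b in enumerate(buckets)]
    return preds


def score(preds, frames):
    """Mean prediction error; each frame is scored by its best state.

    Requires at least two frames.
    """
    pairs = list(zip(frames[:-1], frames[1:]))
    total = sum(min(sq_err(p, prev, cur) for p in preds) for prev, cur in pairs)
    return total / len(pairs)


def identify(models, frames):
    """Return the speaker name whose model predicts the frames best."""
    return min(models, key=lambda name: score(models[name], frames))
```

In this sketch a one-state model is simply `train_speaker(frames, 1)`, so one-state and multi-state models can be compared by changing `n_states` alone. Whether that comparison reproduces the behaviour reported in the paper depends on details the abstract does not give.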

Similar resources

Speaker Recognition Using Gaussian Mixtures Models

The speech signal contains several levels of information. At the first level, it carries the spoken message. At a second level, it also gives information about the speaker's identity, emotional state and so on. The task of speaker recognition can be divided into two parts: speaker identification and speaker verification. Speaker identification is answering the question which one o...

Speaker recognition model using two-dimensional mel-cepstrum and predictive neural network

This paper describes a speaker recognition model using Two-Dimensional Mel-Cepstrum (TDMC) and a predictive neural network. The speaker model consists of two networks. The first one is a self-organizing VQ map (Kohonen's feature map). The second part is the predictive network and learns transitional patterns on the feature map of each speaker's model. TDMC consists of averaged features and dynamic features ...

Text-Dependent Speaker Recognition Using Emotional Features and Neural Networks

This paper deals with a novel feature extraction method for text-dependent speaker recognition. Four female speakers were used to create a text-dependent database for Malayalam (one of the south Indian languages). Discrete Wavelet Transform was used for feature extraction and an artificial neural network was used for machine intelligence. In this work we used emotional features for speaker recogn...

Acoustic-phonetic decoding based on elman predictive neural networks

In this paper we present a phoneme recognition system based on the Elman predictive neural networks. The recurrent neural networks are used to predict the observation vectors of speech frames. Recognition of phonemes is done using the prediction error as distortion measure in the Viterbi algorithm. The performance of the neural predictive networks is evaluated on both the training database and ...

Multi-State Time Delay Neural Networks for Continuous Speech Recognition

Alex Waibel, Carnegie Mellon University, Pittsburgh, PA 15213. We present the "Multi-State Time Delay Neural Network" (MS-TDNN) as an extension of the TDNN to robust word recognition. Unlike most other hybrid methods, the MS-TDNN embeds an alignment search procedure into the connectionist architecture, and allows for word level supervision. The resulting system has the ability to ma...

Publication date: 1995